Analysis of synthetic cellular barcodes in the genome and transcriptome with BARtab and bartools

Summary Cellular barcoding is a lineage-tracing methodology that couples heritable synthetic barcodes to high-throughput sequencing, enabling the accurate tracing of cell lineages across a range of biological contexts. Recent studies have extended these methods by incorporating lineage information into single-cell or spatial transcriptomics readouts. Leveraging the rich biological information within these datasets requires dedicated computational tools for dataset pre-processing and analysis. Here, we present BARtab, a portable and scalable Nextflow pipeline, and bartools, an open-source R package, designed to provide an integrated end-to-end cellular barcoding analysis toolkit. BARtab and bartools contain methods to simplify the extraction, quality control, analysis, and visualization of lineage barcodes from population-level, single-cell, and spatial transcriptomics experiments. We showcase the utility of our integrated BARtab and bartools workflow via the analysis of exemplar bulk, single-cell, and spatial transcriptomics experiments containing cellular barcoding information.


Figure S1. BARtab performance comparison: population-level data. Related to Figure 2. A) Venn diagram of total barcodes detected by BARtab, pycashier, and TimeMachine (reads mode) across the 22 samples of the Goyal et al. dataset after filtering barcodes below 0.001% within a sample. B) Pearson and Spearman correlation of BARtab barcode quantification compared to TimeMachine (reads mode). Boxplots indicate mean and interquartile range (IQR), with whiskers extending 1.5x the IQR. All samples are shown as individual points. C) Exemplar scatter plots for two samples from the Goyal et al. dataset showing total counts per barcode from BARtab and TimeMachine, Spearman's rank correlation coefficient in parentheses (reads mode).
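The within-sample abundance filter mentioned in panel A can be illustrated with a minimal sketch. This is not the BARtab or bartools implementation; the function name `filter_low_abundance`, the barcode names, and the example counts are hypothetical, and the sketch assumes the threshold is applied to each barcode's share of the sample's total counts.

```python
def filter_low_abundance(counts, min_percent=0.001):
    """Keep barcodes whose share of the sample total is at least min_percent (%).

    `counts` maps barcode -> read count for ONE sample (hypothetical layout).
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    # Compare in percent units: count / total * 100 >= min_percent,
    # rearranged to avoid a division per barcode.
    return {bc: n for bc, n in counts.items() if n * 100 >= min_percent * total}


# Hypothetical sample with one million reads in total.
sample = {"BC_A": 990_000, "BC_B": 9_995, "BC_C": 5}
kept = filter_low_abundance(sample)  # BC_C (0.0005%) falls below 0.001%
```

Applying the filter per sample, rather than on pooled counts, keeps a barcode that is rare overall but well represented in one sample.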

Figure S2. Additional quality control metrics for the dose escalation dataset. Related to Figure 2. A) Total number of read counts per sample for the dose escalation dataset. Poor quality samples identified using 5th percentile-based outlier detection are highlighted in red. B) Total number of lineage barcodes detected per sample post quality control and filtering steps for the dose escalation dataset. C) Pairwise Pearson correlation heatmap of biological replicate samples (indicated by TR1/2) for AraC and DMSO treatment groups in the dose escalation dataset. TR = technical replicate. TP = time point. TP0 = time point 0 / baseline sample.

Figure S3. Additional global barcode abundance analyses for the dose escalation dataset. Related to Figure 3. A) Timeseries plot of the most abundant 50 barcodes across all samples within biological replicate 1 from the dose escalation dataset. B) Sample-sample pairwise Spearman correlation matrix of the dose escalation dataset, hierarchically clustered according to sample similarity. C) Stacked histogram of proportional barcode abundance for each sample in the dose escalation dataset. Treatment and replicate groups are indicated by the rows above the plot; the time point is indicated on the x-axis. The top 10 most abundant clones across the entire dataset are indicated. D) Boxplot of Pearson correlations between matched samples from replicate 1 and replicate 2 of the dose escalation dataset for each of the three treatment groups. Boxplots show the mean and interquartile range (IQR). Whiskers extend 1.5x the IQR. Points indicate Pearson correlation values for individual pairs of samples. TP = time point. TP0 = time point 0 / baseline sample.

Figure S4. Individual barcode-level analysis. Related to Figure 3. Violin plots of log10-transformed counts per million (CPM) for selected barcodes from the dose escalation dataset predominant in the (A) vehicle (DMSO) condition, (B) IBET treatment condition, and (C) AraC treatment condition. Inset boxplots show the mean and interquartile range (IQR). Whiskers extend 1.5x the IQR. Points indicate CPM values for individual samples.
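For readers unfamiliar with the CPM scale used in these violin plots, the transformation can be sketched as follows. This is an illustrative example only: the function name `log10_cpm` and the counts are hypothetical, and the pseudocount of 1 (added so that zero counts remain defined on the log scale) is an assumption rather than a documented bartools setting.

```python
import math


def log10_cpm(counts, pseudocount=1.0):
    """Return log10(CPM + pseudocount) per barcode for one sample.

    CPM = barcode count / total sample count * 1e6.
    The pseudocount is an assumption made for this sketch.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {
        bc: math.log10(n / total * 1e6 + pseudocount)
        for bc, n in counts.items()
    }


# Hypothetical sample with 1,000 reads in total.
sample = {"BC_A": 750, "BC_B": 250, "BC_C": 0}
values = log10_cpm(sample)
```

Normalising to CPM before the log transform makes barcode abundances comparable across samples sequenced to different depths.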

Figure S5. BARtab performance comparison: single-cell data. Related to Figure 4. A) Comparison of clone sizes from single-cell barcode annotation results by FateMap vs. BARtab. Samples 1-4 from the Goyal et al. 2023 FM02 dataset are shown. BARtab results are filtered for UMI threshold = 5. FateMap was run according to published parameters (UMI threshold = 15). Pearson correlation coefficient in parentheses; dashed line is x=y. B) Total number of cells annotated with single lineage barcodes by FateMap or BARtab across a range of UMI thresholds for each of the four samples in the Goyal et al. 2023 FM02 dataset. FateMap was run according to published parameters. BARtab was run with (red line) or without (green line) UMI sequence error correction.

Figure S6. The plotClusterEnrichment function in bartools is agnostic to the grouping variable. Related to Figure 4. Hypergeometric testing for enrichment of cell cycle label G1 (A) and G2M (B) for cells within each Louvain cluster in the single-cell dataset. C) UMAP visualisation of the single-cell dataset with cells within each cell cycle phase highlighted.
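The hypergeometric enrichment test named in this caption can be illustrated with a short, generic sketch. It is not the code behind plotClusterEnrichment; the function name `hypergeom_enrichment_p`, its parameter names, and the example numbers are hypothetical. The sketch computes the upper-tail probability of seeing at least the observed number of labelled cells in a cluster when cells are drawn without replacement.

```python
from math import comb


def hypergeom_enrichment_p(total_cells, labelled_cells, cluster_size, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    total_cells    : all cells in the dataset
    labelled_cells : cells carrying the label of interest (e.g. a cell cycle phase)
    cluster_size   : cells in the cluster being tested
    overlap        : labelled cells observed inside that cluster
    """
    upper = min(labelled_cells, cluster_size)
    denom = comb(total_cells, cluster_size)
    # Sum the probability mass for every outcome at least as extreme as observed.
    numer = sum(
        comb(labelled_cells, k)
        * comb(total_cells - labelled_cells, cluster_size - k)
        for k in range(overlap, upper + 1)
    )
    return numer / denom


# Hypothetical example: 10 cells, 5 labelled, a cluster of 4 that is all labelled.
p_value = hypergeom_enrichment_p(10, 5, 4, 4)
```

Because the test only needs a label and a grouping, the same calculation applies whether the label is a lineage barcode, a cell cycle phase, or any other categorical annotation.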